Abstract: Consider the problem of learning the drift coefficient 
of a stochastic differential equation from a sample path. In this 
paper, we assume that the drift is parametrized by a high- 
dimensional vector. We address the question of how long the 
system needs to be observed in order to learn this vector of 
parameters. We prove a general lower bound on this time 
complexity by using a characterization of mutual information 
as time integral of conditional variance, due to Kadota, Zakai, 
and Ziv. This general lower bound is applied to specific classes 
of linear and non-linear stochastic differential equations. In 
the linear case, the problem under consideration is the one of 
learning a matrix of interaction coefficients. We evaluate our 
lower bound for ensembles of sparse and dense random matrices. 
The resulting estimates match the qualitative behavior of upper 
bounds achieved by computationally efficient procedures. 

I. Introduction 

Consider a continuous-time stochastic process $\{x_t\}_{t\ge 0}$ that is defined by a stochastic differential equation (SDE) of the form

$$dx_t = F(x_t;A)\,dt + db_t, \qquad (1)$$

where $x_t \in \mathbb{R}^p$, $b_t$ is a $p$-dimensional standard Brownian motion and the drift coefficient $F(x_t;A) = [F_1(x_t;A), \dots, F_p(x_t;A)] \in \mathbb{R}^p$ is a function of $x_t$ parametrized by $A$, which is an unknown high-dimensional vector.

In this paper we consider the problem of learning information about the vector of parameters $A$ from the observation of a sample trajectory $X_0^T \equiv \{x_t\}_{t\in[0,T]}$. More precisely, we consider the high-dimensional case (where the dimensions of $A$ and $x_t$ are large) and investigate the minimum length of time $T$ over which the system must be observed in order to recover $A$ with some confidence.
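One standard way to generate an approximate sample trajectory $X_0^T$ of (1) is the Euler-Maruyama scheme. The sketch below is an illustration only, not a procedure from this paper; the function name `euler_maruyama`, the step size `dt` and the use of NumPy are our assumptions. For the linear drift $F(x;A) = Ax$ with $A = -I$, the stationary variance of each coordinate is $1/2$, which gives a quick sanity check.

```python
import numpy as np


def euler_maruyama(drift, x0, T, dt, rng):
    """Approximate a path of dx_t = drift(x_t) dt + db_t on [0, T].

    drift: callable R^p -> R^p, i.e. F(.; A) with the parameter A held fixed.
    Returns an array of shape (n_steps + 1, p) with the discretized trajectory.
    """
    x0 = np.asarray(x0, dtype=float)
    n_steps = int(round(T / dt))
    # Brownian increments: i.i.d. N(0, dt) in every coordinate.
    noise = np.sqrt(dt) * rng.standard_normal((n_steps, x0.size))
    path = np.empty((n_steps + 1, x0.size))
    path[0] = x0
    for n in range(n_steps):
        path[n + 1] = path[n] + drift(path[n]) * dt + noise[n]
    return path


# Example: linear drift F(x; A) = A x with A = -I (a 2-dimensional Ornstein-Uhlenbeck process).
A = -np.eye(2)
rng = np.random.default_rng(0)
path = euler_maruyama(lambda x: A @ x, x0=np.zeros(2), T=1000.0, dt=0.01, rng=rng)
print(path.var(axis=0))  # both entries should be close to 1/2
```

The discretization introduces a small bias (for this example the stationary variance of the scheme is $1/(2 - dt)$), which vanishes as `dt` goes to zero.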

Models based on SDE's play a crucial role in several domains of science and technology, ranging from chemistry to finance. As an example, gene regulatory networks can be modeled by systems of non-linear stochastic differential equations, whose variables encode concentrations of certain gene expression products (e.g. proteins). Complex chemical networks are also described by SDE's that can involve hundreds of reactants [2]. The problem of learning the parameters (reaction coefficients) of such an SDE, or simply reconstructing the underlying network structure (i.e. which parameters are non-vanishing), plays a crucial role in this context.



An important subclass of models consists of linear SDE's, whereby the drift is a linear function of $x_t$, namely $F(x_t;A) = Ax_t$ with $A \in \mathbb{R}^{p\times p}$. This can be a good approximation for many systems near a stable equilibrium. Linear SDE's are a special case of a broader class for which the drift is a linear combination of a finite set of basis functions $\mathbf{F}(x_t) = [f_1(x_t), \dots, f_m(x_t)]$, with $f_i : \mathbb{R}^p \to \mathbb{R}$. The drift is then given as $F(x_t;A) = A\,\mathbf{F}(x_t)$, with $A \in \mathbb{R}^{p\times m}$. As an example, within models of chemical reactions, the drift is a low-degree polynomial. For instance, the reaction $A + 2B \to C$ is modeled as $dx_C = k_{C,AB}\,x_A x_B^2\,dt + db_C$, where $x_A$, $x_B$ and $x_C$ denote the concentrations of the species A, B and C respectively, and $db_C$ is a noise term affecting the measurement of $x_C$. In order to learn a model of this type, one can consider a basis of functions that contains all monomials up to a maximum degree.
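A basis of all monomials up to a given degree, and the resulting drift $F(x_t;A) = A\,\mathbf{F}(x_t)$, can be sketched as follows. This is an illustrative NumPy sketch; the helper names and the rate constant value `k_rate` are hypothetical. The example encodes the reaction $A + 2B \to C$ by placing the coefficient $k_{C,AB}$ on the monomial $x_A x_B^2$ in the row of the coefficient matrix that drives $x_C$.

```python
import itertools

import numpy as np


def monomial_exponents(p, max_degree):
    """Exponent vectors e in N^p with |e| <= max_degree (including the constant monomial)."""
    exps = []
    for d in range(max_degree + 1):
        for combo in itertools.combinations_with_replacement(range(p), d):
            e = np.zeros(p, dtype=int)
            for i in combo:
                e[i] += 1
            exps.append(e)
    return np.array(exps)


def basis(x, exps):
    """Evaluate every monomial prod_i x_i^{e_i} at the point x (the vector F(x))."""
    return np.prod(np.power(x[None, :], exps), axis=1)


def drift(x, A, exps):
    """F(x; A) = A F(x): each component is a linear combination of the monomials."""
    return A @ basis(x, exps)


# Species ordered as (x_A, x_B, x_C); reaction A + 2B -> C feeds x_C at rate k * x_A * x_B^2.
exps = monomial_exponents(p=3, max_degree=3)
k_rate = 0.5  # hypothetical rate constant
A = np.zeros((3, len(exps)))
col = next(j for j, e in enumerate(exps) if tuple(e) == (1, 2, 0))
A[2, col] = k_rate
print(len(exps))                                  # 20 monomials of degree <= 3 in 3 variables
print(drift(np.array([2.0, 3.0, 1.0]), A, exps))  # only the x_C component is non-zero (9.0)
```

The number of basis functions grows as $\binom{p+d}{d}$ for degree $d$, which is one reason the parameter vector becomes high-dimensional.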

A. Illustration 

As an illustration, consider a system of $p$ masses in $\mathbb{R}^d$ connected by springs. Let $C^0$ be the corresponding adjacency matrix, i.e. $C^0_{ij} = 1$ if and only if masses $i$ and $j$ are connected, and let $D^0_{ij}$ be the rest length of the spring $(i,j)$. Assuming unit masses and unit elastic coefficients, the dynamics of this system in the presence of external noisy forces can be modeled by the following damped Newton equations



$$dv_t = -\nabla U(q_t)\,dt - \gamma v_t\,dt + \sigma\,db_t, \qquad (2)$$

$$dq_t = v_t\,dt, \qquad (3)$$

$$U(q) \equiv \frac12\sum_{i<j} C^0_{ij}\big(\|q^{(i)} - q^{(j)}\| - D^0_{ij}\big)^2,$$

where $q_t = (q_t^{(1)}, \dots, q_t^{(p)})$, $v_t = (v_t^{(1)}, \dots, v_t^{(p)})$, and $q_t^{(i)}, v_t^{(i)} \in \mathbb{R}^d$ denote the position and velocity of mass $i$ at time $t$. This system of SDE's can be written in the form (1) by letting $x_t = [q_t, v_t]$ and $A = [C^0, D^0]$. A straightforward calculation shows that the drift $F(x_t;A)$ can be further written as a linear combination of the following basis of non-linear functions

$$\mathbf{F}(x_t) = \Big[\{v_t^{(i)}\}_{i\in[p]},\ \{\Delta_t^{(ij)}\}_{i,j\in[p]},\ \Big\{\frac{\Delta_t^{(ij)}}{\|\Delta_t^{(ij)}\|}\Big\}_{i,j\in[p]}\Big], \qquad (4)$$

where $\Delta_t^{(ij)} = q_t^{(i)} - q_t^{(j)}$ and $[p] = \{1, \dots, p\}$. In many situations only specific properties of the parameters are of interest; for instance, in the present example one might be interested only in the network structure.

Fig. 1. Evolution of the horizontal component of the position of three masses in a system with p = 36 masses interacting via elastic springs (cf. Fig. 2 for the network structure). The time interval is here T = 1000. All the springs have rest length D_ij = 1, the damping coefficient is γ = 2, cf. Eq. (2), and the noise variance is σ² = 0.25.

Fig. 2. From left to right and top to bottom: structures reconstructed using the algorithm of [5] with observation time T = 500, 1500, 2500, 3500 and 4500. For T = 4500 exact reconstruction is achieved.

Figure 1 shows the trajectories of three masses in a two-dimensional network of 36 masses and 90 springs evolving according to Eq. (2) and Eq. (3). How long does one need to observe these (and the other masses') trajectories in order to learn the structure of the underlying network? Figure 2 reproduces the network structure reconstructed using the algorithm of [5] for increasing observation intervals T. The inferred structure converges to the actual one only if T is large enough.
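The drift of (2)-(3) only requires the gradient of $U$, whose $i$-th block equals $\sum_j C^0_{ij}\Delta^{(ij)} - C^0_{ij}D^0_{ij}\Delta^{(ij)}/\|\Delta^{(ij)}\|$; this is why the basis (4) suffices. The NumPy sketch below computes $U$, $\nabla U$ and the drift of $(q_t, v_t)$, and checks the gradient against finite differences. It is an illustration under our own assumptions (function names, a three-mass chain in $\mathbb{R}^2$, rest lengths equal to 1); it is not the simulation code behind Fig. 1.

```python
import numpy as np


def potential(q, C, D):
    """U(q) = 1/2 sum_{i<j} C_ij (||q_i - q_j|| - D_ij)^2 for positions q of shape (p, d)."""
    dist = np.linalg.norm(q[:, None, :] - q[None, :, :], axis=-1)
    # Summing over all ordered pairs counts each spring twice, hence the factor 1/4.
    return 0.25 * np.sum(C * (dist - D) ** 2)


def grad_potential(q, C, D):
    """Row i is sum_j C_ij (||Delta_ij|| - D_ij) Delta_ij / ||Delta_ij||, Delta_ij = q_i - q_j."""
    delta = q[:, None, :] - q[None, :, :]
    dist = np.linalg.norm(delta, axis=-1)
    np.fill_diagonal(dist, 1.0)  # C_ii = 0, so this only avoids a 0/0 on the diagonal
    coef = C * (dist - D) / dist
    return np.einsum("ij,ijd->id", coef, delta)


def drift(q, v, C, D, gamma):
    """Drift of (3) and (2): dq = v dt, dv = (-grad U(q) - gamma v) dt (noise omitted)."""
    return v, -grad_potential(q, C, D) - gamma * v


# Three masses in R^2 joined in a chain 0-1-2, all rest lengths equal to 1.
C = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D = np.ones((3, 3))
q_rest = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(np.abs(grad_potential(q_rest, C, D)).max())  # 0.0: springs at rest exert no force

# Central finite-difference check of the gradient at a random configuration.
rng = np.random.default_rng(1)
q = rng.normal(size=(3, 2))
eps = 1e-6
num = np.zeros_like(q)
for i in range(3):
    for a in range(2):
        e = np.zeros_like(q)
        e[i, a] = eps
        num[i, a] = (potential(q + e, C, D) - potential(q - e, C, D)) / (2 * eps)
print(np.abs(num - grad_potential(q, C, D)).max())  # small (finite-difference error only)
```

Adding the noise term $\sigma\,db_t$ to the velocity update, for instance with an Euler-Maruyama step, turns this drift into a simulator of (2)-(3).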

B. Related Work 

Over the last few years, a significant effort has been devoted to developing methods and sample complexity bounds for learning graphical models from data. Particular effort was devoted to learning sparse graphical models using convex regularizations that promote sparsity. Well-known examples in the context of Gaussian graphical models include the graphical LASSO [6] and the pseudo-likelihood method of [7]. These papers assume that the data are i.i.d. samples from a high-dimensional Gaussian distribution. However, in many cases samples are produced by an underlying dynamical process and the i.i.d. assumption is unrealistic.

In [5], a convex regularization method was developed to learn linear SDE's with a sparse network structure from data. The upper bounds on the sample complexity proved in [5] match in several cases the lower bounds developed here. The related topic of learning graphical models for autoregressive processes was studied recently in [8], [9]. These papers propose a convex relaxation different from the one of [5], without however developing estimates on the sample complexity for model selection.

Finally, a substantial literature addresses various questions related to learning SDE's [3], [10], [11]. However, this line of work did not yield quantitative estimates on the scaling of sample complexity with the problem dimensionality.

II. Main Results 

Without loss of generality, assume that the parameter $A$ is a random variable chosen with some unknown prior distribution $\mathbb{P}_A$ (the subscript will often be omitted). We are interested in a specific property of $A$ that is given by a function $A \mapsto M(A)$. Unless specified otherwise, $\mathbb{P}$ and $\mathbb{E}$ denote probability and expectation with respect to the joint law of $\{x_t\}_{t\ge0}$ and $A$. As mentioned above, $X_0^T \equiv \{x_t\}_{0\le t\le T}$ denotes the trajectory up to time $T$. Also, we define the variance of a vector-valued random variable as the sum of the variances over all components, i.e.,

$$\mathrm{Var}_{A|X_0^t}(F(x_t;A)) = \sum_{i=1}^p \mathrm{Var}_{A|X_0^t}(F_i(x_t;A)). \qquad (5)$$

Our main tool is the following general lower bound, which follows from an identity between mutual information and the integral of conditional variance proved by Kadota, Zakai and Ziv [12].

Theorem II.1. Let $\widehat{M}_T(X_0^T)$ be an estimator of $M(A)$ based on $X_0^T$. If $\mathbb{P}(\widehat{M}_T(X_0^T) \neq M(A)) < \frac12$ then

$$T \ge \frac{H(M(A)) - 2I(A;x_0)}{\frac{1}{T}\int_0^T \mathbb{E}_{X_0^t}\{\mathrm{Var}_{A|X_0^t}(F(x_t;A))\}\,dt}. \qquad (6)$$

Proof: Equation (1) can be regarded as describing a white Gaussian channel with feedback, where $A$ denotes the message to be transmitted. For this scenario, Kadota et al. [12] give the following identity for the mutual information between $X_0^T$ and $A$ when the initial condition is $x_0 = 0$:

$$I(X_0^T;A) = \frac12\int_0^T \mathbb{E}_{X_0^t}\{\mathrm{Var}_{A|X_0^t}(F(x_t;A))\}\,dt. \qquad (7)$$

For the general case where $x_0 \neq 0$ and might depend on $A$ (if, for example, $x_0$ is drawn from the stationary distribution of the system), we can write $I(X_0^T;A) = I(x_0;A) + I(X_0^T;A\,|\,x_0)$ and apply the previous identity to $I(X_0^T;A\,|\,x_0)$. Taking into account that $I(\widehat{M}_T(X_0^T);M(A)) \le I(X_0^T;A)$ and making use of Fano's inequality $I(\widehat{M}_T(X_0^T);M(A)) \ge \mathbb{P}(\widehat{M}_T(X_0^T) = M(A))\,H(M(A))$, where the probability of correct recovery exceeds $\frac12$, we get $\frac12 H(M(A)) \le I(x_0;A) + \frac12\int_0^T \mathbb{E}_{X_0^t}\{\mathrm{Var}_{A|X_0^t}(F(x_t;A))\}\,dt$, which rearranges to (6). ■



The bound in Theorem II.1 is often too complex to be evaluated. Instead, the following corollary provides a more easily computable bound.

Corollary II.2. Assume that the process $\{x_t\}_{t\ge0}$ is stationary. Let $\widehat{M}_T(X_0^T)$ be an estimator of $M(A)$ based on $X_0^T$. If $\mathbb{P}(\widehat{M}_T(X_0^T) \neq M(A)) < \frac12$ then

$$T \ge \frac{H(M(A)) - 2I(A;x_0)}{\mathbb{E}_{x_0}\{\mathrm{Var}_{A|x_0}(F(x_0;A))\}}. \qquad (8)$$

Proof: Since conditioning reduces variance (in expectation) and $x_t$ is a function of $X_0^t$, we have $\mathbb{E}_{X_0^t}\{\mathrm{Var}_{A|X_0^t}(F(x_t;A))\} \le \mathbb{E}_{x_t}\{\mathrm{Var}_{A|x_t}(F(x_t;A))\}$. Using stationarity, we have $\mathbb{E}_{x_t}\{\mathrm{Var}_{A|x_t}(F(x_t;A))\} = \mathbb{E}_{x_0}\{\mathrm{Var}_{A|x_0}(F(x_0;A))\}$, which simplifies (6) to (8). ■

In the rest of this section, we apply this lower bound to special classes of SDE's. In all of our applications it is understood that the process $\{x_t\}_{t\ge0}$ is stationary.

A. Learning Sparse Linear SDE's 
Consider the linear SDE

$$dx_t = Ax_t\,dt + db_t. \qquad (9)$$

The goal is to learn the interaction matrix $A \in \mathbb{R}^{p\times p}$. The first two theorems stated below provide lower bounds on the sample complexity $T$ for the two regimes of sparse and dense matrices. Throughout this paper $Q^*$ will denote the transpose of matrix $Q$. Given a matrix $Q$, its support $\mathrm{supp}(Q)$ is the 0-1 matrix such that $\mathrm{supp}(Q)_{ij} = 1$ if and only if $Q_{ij} \neq 0$. Its 'signed support' $\mathrm{sign}(Q)$ is the matrix such that $\mathrm{sign}(Q)_{ij} = \mathrm{sign}(Q_{ij})$ if $Q_{ij} \neq 0$ and $\mathrm{sign}(Q)_{ij} = 0$ otherwise.
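Both notions of support are straightforward to compute. The following NumPy sketch (function names are ours) also makes explicit that recovering $M(A) = \mathrm{sign}(A)$ requires every entry, including its sign, to be correct, while the magnitudes of the estimated entries are irrelevant.

```python
import numpy as np


def supp(Q):
    """0-1 matrix with supp(Q)_ij = 1 iff Q_ij != 0."""
    return (np.asarray(Q) != 0).astype(int)


def signed_support(Q):
    """sign(Q)_ij = sign(Q_ij) if Q_ij != 0 and 0 otherwise (np.sign maps 0 to 0)."""
    return np.sign(np.asarray(Q)).astype(int)


def exact_recovery(Q_hat, Q):
    """True iff the estimated signed support coincides with the true one entry by entry."""
    return bool(np.array_equal(signed_support(Q_hat), signed_support(Q)))


Q = np.array([[-1.5, 0.0], [0.2, -3.0]])
print(supp(Q).tolist())            # [[1, 0], [1, 1]]
print(signed_support(Q).tolist())  # [[-1, 0], [1, -1]]
print(exact_recovery(np.array([[-1.0, 0.0], [0.1, -2.0]]), Q))  # True: magnitudes may differ
```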

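Define the class of matrices $\mathcal{A}^{(S)} \subseteq \mathbb{R}^{p\times p}$ by letting $A \in \mathcal{A}^{(S)}$ if and only if

(i) $A$ has at most $k$ non-zero elements per row, $k \ge 3$,

(ii) $\min_{(i,j):\,A_{ij}\neq 0} |A_{ij}| \ge a_{\min}$,

(iii) letting $\lambda_{\min}(Q)$ denote the smallest eigenvalue of matrix $Q$, $\lambda_{\min}(-(A + A^*)/2) \ge \rho > 0$.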
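The three defining conditions of $\mathcal{A}^{(S)}$ can be checked mechanically for a given matrix. A minimal NumPy sketch (the function name and the numerical tolerance `tol` are our assumptions; condition (i) counts all entries of a row, diagonal included, as in the definition):

```python
import numpy as np


def in_sparse_class(A, k, a_min, rho, tol=1e-10):
    """Check the three defining conditions of A^(S) for a given square matrix A."""
    A = np.asarray(A, dtype=float)
    # (i) at most k non-zero elements per row
    if np.count_nonzero(A, axis=1).max() > k:
        return False
    # (ii) every non-zero entry has magnitude at least a_min
    nonzero = np.abs(A[A != 0])
    if nonzero.size > 0 and nonzero.min() < a_min:
        return False
    # (iii) lambda_min(-(A + A^*)/2) >= rho, checked up to a small numerical tolerance
    return bool(np.linalg.eigvalsh(-(A + A.T) / 2).min() >= rho - tol)


# -2 I plus 0.5 times the adjacency matrix of a triangle: 3 non-zeros per row,
# and the eigenvalues of -(A + A^*)/2 are 1, 2.5, 2.5.
A = -2.0 * np.eye(3) + 0.5 * (np.ones((3, 3)) - np.eye(3))
print(in_sparse_class(A, k=3, a_min=0.5, rho=1.0))  # True
print(in_sparse_class(A, k=2, a_min=0.5, rho=1.0))  # False: too many non-zeros per row
print(in_sparse_class(A, k=3, a_min=0.5, rho=1.5))  # False: rho too large
```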
The next theorem provides a lower bound on the time complexity of learning the signed support of models from the class $\mathcal{A}^{(S)}$.

Theorem II.3. Let $M(A) = \mathrm{sign}(A)$ be the signed support of $A$ and $\widehat{M}_T(X_0^T)$ an estimator of $M(A)$ based on $X_0^T$. There is a constant $C(k)$ such that, for all $p$ large enough, if $\sup_{A\in\mathcal{A}^{(S)}} \mathbb{P}_{X_0^T|A}(M(A) \neq \widehat{M}_T(X_0^T)) < \frac12$ then

$$T \ge C(k)\,\max\Big\{\frac{\rho}{a_{\min}^2}, \frac{1}{a_{\min}}\Big\}\log p. \qquad (10)$$



B. Learning Dense Linear SDE's 

A different regime of interest in learning the network of interactions for a linear SDE is the case of dense matrices. As we shall see shortly, this regime exhibits fundamentally different behavior in terms of sample complexity compared to the regime of sparse matrices.

Let $\mathcal{A}^{(D)} \subseteq \mathbb{R}^{p\times p}$ be the set of matrices with the following properties: $A \in \mathcal{A}^{(D)}$ if and only if

(i) $a_{\min} \le |A_{ij}|\,p^{1/2} \le a_{\max}$ for every non-zero off-diagonal entry $A_{ij}$,

(ii) $\lambda_{\min}(-(A + A^*)/2) \ge \rho > 0$.

The second theorem provides a lower bound for learning the signed support of models from the class $\mathcal{A}^{(D)}$.

Theorem II.4. Let $M(A) = \mathrm{sign}(A)$ be the signed support of $A$ and $\widehat{M}_T(X_0^T)$ an estimator of $M(A)$ based on $X_0^T$. There exists a constant $C$ such that, for all $p$ large enough, if $\sup_{A\in\mathcal{A}^{(D)}} \mathbb{P}_{X_0^T|A}(M(A) \neq \widehat{M}_T(X_0^T)) < \frac12$ then

$$T \ge C\,\max\Big\{\frac{\rho}{a_{\min}^2}, \frac{1}{a_{\min}}\Big\}\,p. \qquad (11)$$



Together with the upper bounds from [5], Theorem II.3 establishes that the time complexity of learning sparse linear SDE's is $T = \Theta(\log p)$. Further, this task can be performed efficiently using $\ell_1$-penalized least squares [5]. On the other hand, Theorem II.4 implies a dramatic dichotomy: the time complexity of learning dense linear SDE's is at least linear in $p$ (and indeed matching upper bounds can be proved in this case as well [13]).

C. Learning Non-Linear SDE's 

In this section we assume that the observed samples $X_0^T$ come from a stochastic process driven by a general SDE of the form (1).

In what follows, $v^{(i)}$ denotes the $i$-th component of vector $v$. For example, $x_2^{(3)}$ is the 3rd component of the vector $x_t$ at time $t = 2$. $JF(\,\cdot\,;A) \in \mathbb{R}^{p\times p}$ will denote the Jacobian of the function $F(\,\cdot\,;A)$.

For fixed $L$, $B$ and $D > 0$, define the class of functions $\mathcal{A}^{(N)} = \mathcal{A}^{(N)}(L,B,D)$ by letting $F(x;A) \in \mathcal{A}^{(N)}$ if and only if

(i) the support of $JF(x;A)$ has at most $k$ non-zero entries per row for every $x$,

(ii) the covariance matrix of the stationary process, $\Sigma_\infty$, satisfies $\lambda_{\min}(\Sigma_\infty) \ge L$,

(iii) $\mathrm{Var}_{x_0|A}(x_0^{(i)}) \le B$ for all $i$,

(iv) $|\partial F_i(x;A)/\partial x^{(j)}| \le D$ for all $x \in \mathbb{R}^p$ and $i,j \in [p]$.

For simplicity we write $A \in \mathcal{A}^{(N)}$ for $F(x;A) \in \mathcal{A}^{(N)}$.

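Theorem II.5. Let $M(A)$ be the smallest support for which $\mathrm{supp}(JF(x;A)) \subseteq M(A)$ for all $x$. If $\widehat{M}_T(X_0^T)$ is an estimator of $M(A)$ based on $X_0^T$ and $\sup_{A\in\mathcal{A}^{(N)}} \mathbb{P}_{X_0^T|A}(\widehat{M}_T(X_0^T) \neq M(A)) < 1/2$ then

$$T \ge \frac{k\log(p/k) - \log(B/L)}{C + 2k^2D^2B}. \qquad (12)$$

In the above expression $C = \max_{i\in[p]} \mathbb{E}\{F_i(\mathbb{E}_{x_0|A}(x_0);A)^2\}$.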

Remark II.1. Note that the assumption that $F$ is Lipschitz is not very strong, in the sense that it is usually required for existence and uniqueness of a solution of the SDE (1) with finite expected energy [14].



III. Proofs and technical lemmas 



In this section we prove Theorems II.3 to II.5. Throughout, $\{x_t\}_{t\ge0}$ is assumed to be a stationary process. It is immediate to check that under the assumptions of Theorems II.3 and II.4 the SDE admits a unique stationary measure, with bounded covariance. We let $\Sigma_\infty = \mathbb{E}\{x_0x_0^*\} - \mathbb{E}\{x_0\}(\mathbb{E}\{x_0\})^* = \mathbb{E}\{x_tx_t^*\} - \mathbb{E}\{x_t\}(\mathbb{E}\{x_t\})^*$ denote this covariance.

A. A general bound for linear SDE's 

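Before passing to the actual proofs, it is useful to establish a general bound for linear SDE's (9) with symmetric interaction matrix $A$.

Lemma III.1. Assume that $\{x_t\}_{t\ge0}$ is a stationary process generated by the linear SDE (9), with $A$ symmetric. Let $\widehat{M}_T(X_0^T)$ be an estimator of $M(A)$ based on $X_0^T$. If $\mathbb{P}(\widehat{M}_T(X_0^T) \neq M(A)) < \frac12$ then

$$T \ge \frac{H(M(A)) - 2I(A;x_0)}{\frac12\mathrm{Tr}\{\mathbb{E}\{-A\} - (\mathbb{E}\{-A^{-1}\})^{-1}\}}. \qquad (13)$$

Proof: The bound follows from Corollary II.2 after showing that $\mathbb{E}_{x_0}\{\mathrm{Var}_{A|x_0}(Ax_0)\} \le \frac12\mathrm{Tr}\{\mathbb{E}\{-A\} - (\mathbb{E}\{-A^{-1}\})^{-1}\}$. First note that

$$\mathbb{E}_{x_0}\{\mathrm{Var}_{A|x_0}(Ax_0)\} = \mathbb{E}_{x_0}\|Ax_0 - \mathbb{E}_{A|x_0}(Ax_0|x_0)\|_2^2. \qquad (14)$$

The quantity in (14) can be thought of as the $\ell_2$-norm error of estimating $Ax_0$ based on $x_0$ using $\mathbb{E}_{A|x_0}(Ax_0|x_0)$. Since the conditional expectation is the minimum mean square error estimator, replacing $\mathbb{E}_{A|x_0}(Ax_0|x_0)$ by any estimator of $Ax_0$ based on $x_0$ gives an upper bound for the expression in (14). We choose a linear estimator, i.e., an estimator of the form $Bx_0$ where $B = \mathbb{E}_A\{A\Sigma_\infty\}(\mathbb{E}_A\{\Sigma_\infty\})^{-1}$:

$$\begin{aligned}\mathbb{E}_{x_0}\|Ax_0 - \mathbb{E}_{A|x_0}(Ax_0|x_0)\|_2^2 &\le \mathbb{E}_{x_0}\|Ax_0 - Bx_0\|_2^2\\ &= \mathrm{Tr}\{\mathbb{E}\{Ax_0x_0^*A^*\}\} - 2\,\mathrm{Tr}\{B\,\mathbb{E}\{x_0x_0^*A^*\}\} + \mathrm{Tr}\{B\,\mathbb{E}\{x_0x_0^*\}B^*\}.\end{aligned} \qquad (15)$$

Furthermore, for a linear system, $\Sigma_\infty$ satisfies the Lyapunov equation $A\Sigma_\infty + \Sigma_\infty A^* + I = 0$. For $A$ symmetric, this implies $\Sigma_\infty = -\frac12 A^{-1}$. Substituting this expression in (14) and (15), and noting that the stationary mean of (9) is zero so that $\mathbb{E}\{x_0x_0^*\,|\,A\} = \Sigma_\infty$, finishes the proof. ■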
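Two facts used at the end of this proof can be checked numerically: the Lyapunov equation $A\Sigma_\infty + \Sigma_\infty A^* + I = 0$ has solution $\Sigma_\infty = -\frac12A^{-1}$ when $A$ is symmetric and stable, and the denominator of (13) vanishes when $A$ is deterministic, since then $(\mathbb{E}\{-A^{-1}\})^{-1} = -A$. The sketch solves the Lyapunov equation by vectorization with NumPy (a library routine such as SciPy's `solve_continuous_lyapunov` would also work) and replaces expectations by sample averages over a list of matrices; the helper names and the random test ensemble are our assumptions.

```python
import numpy as np


def stationary_covariance(A):
    """Solve A S + S A^T + I = 0 via (I kron A + A kron I) vec(S) = -vec(I)."""
    p = A.shape[0]
    I = np.eye(p)
    K = np.kron(I, A) + np.kron(A, I)
    return np.linalg.solve(K, -I.reshape(-1)).reshape(p, p)


def lemma_denominator(samples):
    """(1/2) Tr{E(-A) - (E(-A^{-1}))^{-1}}, with E replaced by an average over samples."""
    mean_neg_A = np.mean([-A for A in samples], axis=0)
    mean_neg_inv = np.mean([-np.linalg.inv(A) for A in samples], axis=0)
    return 0.5 * np.trace(mean_neg_A - np.linalg.inv(mean_neg_inv))


rng = np.random.default_rng(0)


def random_stable_symmetric(p):
    """A = -(B B^T + I): symmetric with all eigenvalues <= -1."""
    B = rng.normal(size=(p, p))
    return -(B @ B.T + np.eye(p))


A = random_stable_symmetric(4)
S = stationary_covariance(A)
print(np.allclose(S, -0.5 * np.linalg.inv(A)))                                # True
print(abs(lemma_denominator([A, A, A])) < 1e-9)                               # True: no randomness in A
print(lemma_denominator([random_stable_symmetric(4) for _ in range(50)]) >= 0)  # True
```

The last line reflects the matrix arithmetic-harmonic mean inequality $\mathbb{E}\{M\} \succeq (\mathbb{E}\{M^{-1}\})^{-1}$ for positive definite $M = -A$, which keeps the denominator of (13) non-negative.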



B. Proof of Theorem II.3

We prove the theorem by showing that the same complexity bound holds when we are trying to estimate the signed support of a matrix $A$ chosen at random with a distribution supported on $\mathcal{A}^{(S)}$, and we simultaneously require that the average probability of error be smaller than $1/2$. This guarantees that, unless the bound holds, there will exist $A \in \mathcal{A}^{(S)}$ for which the probability of error is bigger than $1/2$. The complexity bound for random matrices $A$ is proved using Lemma III.1.

In order to generate $A$ at random we proceed as follows. Let $G$ be the adjacency matrix of a uniformly random $k$-regular graph on $p$ vertices. Generate $\tilde{A}$ by flipping the sign of each non-zero entry in $G$ with probability $1/2$ independently (flipping $G_{ij}$ and $G_{ji}$ together, so that $\tilde{A}$ remains symmetric). We define $A$ to be the random matrix $A = -(\gamma + 2a_{\min}\sqrt{k-1})I + a_{\min}\tilde{A}$, where $\gamma = \gamma(\tilde{A}) \ge 0$ is the smallest value such that the maximum eigenvalue of $A$ is at most $-\rho$. This guarantees that all these $A$ satisfy the defining properties of the class $\mathcal{A}^{(S)}$.
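The random ensemble above can be sampled as follows. The sketch draws a uniformly random simple $k$-regular graph with the configuration model (pair up $pk$ half-edges uniformly at random and reject pairings with self-loops or multiple edges; conditioned on acceptance the graph is uniform), flips signs symmetrically, and picks the smallest $\gamma \ge 0$ with $\lambda_{\max}(A) \le -\rho$, namely $\gamma = \max\{0, \rho + a_{\min}(\lambda_{\max}(\tilde{A}) - 2\sqrt{k-1})\}$. The rejection approach is our choice for illustration and is only practical for small $k$; function names and parameter values are assumptions.

```python
import numpy as np


def random_regular_graph(p, k, rng, max_tries=10_000):
    """Adjacency matrix of a uniformly random simple k-regular graph (configuration model)."""
    if (p * k) % 2 != 0:
        raise ValueError("p * k must be even")
    stubs = np.repeat(np.arange(p), k)  # k half-edges per vertex
    for _ in range(max_tries):
        pairs = rng.permutation(stubs).reshape(-1, 2)
        i, j = pairs[:, 0], pairs[:, 1]
        if np.any(i == j):
            continue  # self-loop: reject
        G = np.zeros((p, p), dtype=int)
        np.add.at(G, (i, j), 1)
        np.add.at(G, (j, i), 1)
        if G.max() > 1:
            continue  # multiple edge: reject
        return G
    raise RuntimeError("no simple k-regular graph found")


def sparse_ensemble(p, k, a_min, rho, rng):
    """Sample A = -(gamma + 2 a_min sqrt(k-1)) I + a_min * A_tilde as in the proof of Theorem II.3."""
    G = random_regular_graph(p, k, rng)
    signs = np.triu(rng.choice([-1, 1], size=(p, p)), 1)
    A_tilde = G * (signs + signs.T)  # symmetric random signs on the edges
    edge = 2.0 * np.sqrt(k - 1)
    lam_max = np.linalg.eigvalsh(A_tilde).max()
    gamma = max(0.0, rho + a_min * (lam_max - edge))
    A = -(gamma + a_min * edge) * np.eye(p) + a_min * A_tilde
    return A, gamma


rng = np.random.default_rng(0)
A, gamma = sparse_ensemble(p=60, k=3, a_min=0.5, rho=0.1, rng=rng)
off = A - np.diag(np.diag(A))
print(np.count_nonzero(off, axis=1).min(), np.count_nonzero(off, axis=1).max())  # 3 3
print(np.linalg.eigvalsh(A).max() <= -0.1 + 1e-9)  # True
```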

The following lemma encapsulates the necessary random 
matrix calculations. 

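Lemma III.2. Let $A$ be a random matrix defined as above and

$$Q(a_{\min},k,\rho) = \lim_{p\to\infty}\frac1p\big(\mathrm{Tr}\{\mathbb{E}(-A)\} - \mathrm{Tr}\{(\mathbb{E}(-A^{-1}))^{-1}\}\big). \qquad (16)$$

Then there exists a constant $C'$ dependent only on $k$ such that

$$Q(a_{\min},k,\rho) \le \min\Big\{\frac{C'a_{\min}^2}{\rho}, \frac{k\,a_{\min}}{\sqrt{k-1}}\Big\}. \qquad (17)$$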
Proof: First notice that

$$\lim_{p\to\infty}\frac1p\mathbb{E}\,\mathrm{Tr}\{-A\} = \lim_{p\to\infty}\mathbb{E}(\gamma) + 2a_{\min}\sqrt{k-1} \qquad (18)$$

$$= \rho + 2a_{\min}\sqrt{k-1}, \qquad (19)$$

since, by the Kesten-McKay law [15], for large $p$ the spectrum of $a_{\min}\tilde{A}$ has support in $(-\epsilon - 2a_{\min}\sqrt{k-1},\ 2a_{\min}\sqrt{k-1} + \epsilon)$ with high probability. Notice that unless we randomize each non-zero entry of $\tilde{A}$ with $\{-1,+1\}$ values, every $\tilde{A}$ will have $k$ as its largest eigenvalue and the above limit will not hold.

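For the second term we will compute a lower bound. For that purpose let $\lambda_i > 0$ be the $i$-th eigenvalue of the matrix $\mathbb{E}(-A^{-1})$. We can write

$$\frac1p\mathrm{Tr}\{(\mathbb{E}(-A^{-1}))^{-1}\} = \frac1p\sum_{i=1}^p\frac{1}{\lambda_i} \qquad (20)$$

$$\ge \frac{1}{\frac1p\sum_{i=1}^p\lambda_i} = \frac{1}{\mathbb{E}\{\frac1p\mathrm{Tr}\{(-A)^{-1}\}\}}, \qquad (21)$$

where we applied Jensen's inequality (convexity of $\lambda \mapsto 1/\lambda$) in the last step. By the Kesten-McKay law we now have

$$\lim_{p\to\infty}\mathbb{E}\Big\{\frac1p\mathrm{Tr}\{(-A)^{-1}\}\Big\} = \mathbb{E}\Big\{\lim_{p\to\infty}\frac1p\mathrm{Tr}\{(-A)^{-1}\}\Big\} \qquad (22)$$

$$= \frac{1}{a_{\min}}\,G\big(k,\rho/a_{\min} + 2\sqrt{k-1}\big), \qquad (23)$$

where

$$G(k,z) = \int\frac{1}{z-\nu}\,d\mu(\nu) \qquad (24)$$

and

$$d\mu(\nu) = \frac{k\sqrt{4(k-1)-\nu^2}}{2\pi(k^2-\nu^2)}\,d\nu \qquad (25)$$

for $\nu \in [-2\sqrt{k-1}, 2\sqrt{k-1}]$ and zero otherwise. Expression (25) defines the Kesten-McKay distribution. Computing the above integral we obtain, for $z > 2\sqrt{k-1}$ (the singularity at $z = k$ being removable),

$$G(k,z) = \frac{(k-2)z - k\sqrt{z^2-4(k-1)}}{2(k^2-z^2)}. \qquad (26)$$

Combining (19), (21) and (23), $Q(a_{\min},k,\rho)$ is upper bounded by $\rho + 2a_{\min}\sqrt{k-1} - a_{\min}/G(k,z)$ with $z = \rho/a_{\min} + 2\sqrt{k-1}$, and for this bound

$$\lim_{\rho\to0} Q(a_{\min},k,\rho) = \frac{k\,a_{\min}}{\sqrt{k-1}}, \qquad (27)$$

$$\lim_{\rho\to\infty}\rho\,Q(a_{\min},k,\rho) = k\,a_{\min}^2. \qquad (28)$$

Since this bound, divided by $a_{\min}$, is a function of $k$ and $\rho/a_{\min}$ that is strictly decreasing in $\rho/a_{\min}$, the claimed bound follows. ■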
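The closed form (26) and the limits (27)-(28) can be checked numerically. The sketch integrates the Kesten-McKay density (25) with the substitution $\nu = 2\sqrt{k-1}\cos\theta$ and the midpoint rule, and evaluates the upper bound $\rho + 2a_{\min}\sqrt{k-1} - a_{\min}/G(k,z)$ on $Q$. For numerical stability it also uses the algebraically equivalent form $G(k,z) = 2(k-1)/\big((k-2)z + k\sqrt{z^2-4(k-1)}\big)$, obtained by rationalizing the numerator of (26), which has no removable singularity at $z = k$. Function names and quadrature settings are our assumptions.

```python
import numpy as np


def km_density(nu, k):
    """Kesten-McKay density (25); zero outside [-2 sqrt(k-1), 2 sqrt(k-1)]."""
    nu = np.asarray(nu, dtype=float)
    out = np.zeros_like(nu)
    inside = np.abs(nu) < 2 * np.sqrt(k - 1)
    x = nu[inside]
    out[inside] = k * np.sqrt(4 * (k - 1) - x**2) / (2 * np.pi * (k**2 - x**2))
    return out


def G_numeric(k, z, n=4000):
    """G(k, z) = int dmu(nu) / (z - nu), via nu = 2 sqrt(k-1) cos(theta) and the midpoint rule."""
    r = 2 * np.sqrt(k - 1)
    theta = (np.arange(n) + 0.5) * np.pi / n
    nu = r * np.cos(theta)
    integrand = km_density(nu, k) * r * np.sin(theta) / (z - nu)
    return integrand.sum() * np.pi / n


def G_26(k, z):
    """Closed form (26), as written (0/0 at z = k)."""
    return ((k - 2) * z - k * np.sqrt(z**2 - 4 * (k - 1))) / (2 * (k**2 - z**2))


def G_stable(k, z):
    """Equivalent form of (26): 2(k-1) / ((k-2) z + k sqrt(z^2 - 4(k-1))), for z >= 2 sqrt(k-1)."""
    s = np.sqrt(np.maximum(z**2 - 4 * (k - 1), 0.0))
    return 2 * (k - 1) / ((k - 2) * z + k * s)


def Q_upper(a_min, k, rho):
    """Upper bound on Q from (19), (21), (23): rho + 2 a sqrt(k-1) - a / G(k, z)."""
    z = rho / a_min + 2 * np.sqrt(k - 1)
    return rho + 2 * a_min * np.sqrt(k - 1) - a_min / G_stable(k, z)


k, a = 4, 1.0
print(G_numeric(k, 5.0), G_26(k, 5.0), G_stable(k, 5.0))  # three matching values
print(Q_upper(a, k, 0.0), k * a / np.sqrt(k - 1))          # limit (27)
print(1e3 * Q_upper(a, k, 1e3), k * a**2)                  # close to limit (28)
```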



Proof (Theorem II.3): Starting from the bound of Lemma III.1, we divide both the numerator and the denominator by $p$. The term $H(M(A))/p$ can be lower bounded by $p^{-1}\log\big(\binom{p}{k}2^k\big)^p \ge k\log(2p/k)$, and Lemma III.2 gives an upper bound on the denominator as $p \to \infty$. We now prove that $\lim_{p\to\infty} I(x_0;A)/p \le 1$. This finishes the proof of Theorem II.3, since after multiplying by a small enough constant (dependent only on $k$) the bound obtained by replacing the numerator and denominator with these limits will be valid for all $p$ large enough.
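We start by writing

$$I(x_0;A) = h(x_0) - h(x_0|A) \qquad (29)$$

$$\le \frac12\log\big((2\pi e)^p|\mathbb{E}(\Sigma_\infty)|\big) - \frac12\mathbb{E}\log\big((2\pi e)^p|\Sigma_\infty|\big), \qquad (30)$$

where $\Sigma_\infty = -\frac12A^{-1}$ is the covariance matrix of the stationary process $x_t$ and $|\cdot|$ denotes the determinant of a matrix. Here we used that, given $A$, $x_0$ is Gaussian with zero mean and covariance $\Sigma_\infty$, while unconditionally $x_0$ has covariance $\mathbb{E}(\Sigma_\infty)$ and the Gaussian maximizes differential entropy for a given covariance. Then, since the factors $(2\pi e)^p$ and $2^{-p}$ cancel between the two terms, as does any rescaling $\beta > 0$, we can write

$$I(x_0;A) \le \frac12\log|\mathbb{E}(-(\beta A)^{-1})| + \frac12\mathbb{E}\log|-\beta A| \qquad (31)$$

$$\le \frac12\mathrm{Tr}\,\mathbb{E}\{-I - (\beta A)^{-1}\} + \frac12\mathbb{E}\,\mathrm{Tr}\{-I - \beta A\}, \qquad (32)$$

where $\beta > 0$ is an arbitrary rescaling factor and the last inequality follows from $\log|I + M| \le \mathrm{Tr}(M)$. From this and equations (19) and (22) it follows that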

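$$\lim_{p\to\infty}\frac1p I(x_0;A) \le -1 + \frac12\big(\beta'z + \beta'^{-1}G(k,z)\big), \qquad (33)$$

where $z = \rho/a_{\min} + 2\sqrt{k-1}$ and $\beta' = \beta a_{\min}$. Minimizing over $\beta'$ and then taking the worst case over $z \ge 2\sqrt{k-1}$ (attained at $z = 2\sqrt{k-1}$, where $zG(k,z) = 2(k-1)/(k-2)$) gives

$$\min_{\beta'>0}\big\{\beta'z + \beta'^{-1}G(k,z)\big\} = 2\sqrt{zG(k,z)} \le \sqrt{\frac{8(k-1)}{k-2}} \le 4, \qquad (34)$$

for $k \ge 3$, which implies $\lim_{p\to\infty} I(x_0;A)/p \le 1$. ■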
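The last step can be checked numerically: for fixed $z$ the minimum over $\beta' > 0$ of $\beta'z + \beta'^{-1}G(k,z)$ is $2\sqrt{zG(k,z)}$, attained at $\beta' = \sqrt{G(k,z)/z}$, and $zG(k,z)$ is largest at the spectral edge $z = 2\sqrt{k-1}$. The sketch below is a grid search under our own choice of grids; it uses the rationalized, algebraically equivalent form of (26) stated in its docstring, and confirms that the right-hand side of (33) does not exceed 1 for the values of $k$ tried.

```python
import numpy as np


def G(k, z):
    """Kesten-McKay Stieltjes transform (26), as 2(k-1) / ((k-2) z + k sqrt(z^2 - 4(k-1)))."""
    return 2 * (k - 1) / ((k - 2) * z + k * np.sqrt(np.maximum(z**2 - 4 * (k - 1), 0.0)))


def rhs_33(k, z, beta):
    """Right-hand side of (33) for given z and beta' (here called beta)."""
    return -1 + 0.5 * (beta * z + G(k, z) / beta)


betas = np.geomspace(1e-5, 1e3, 40001)
worst = -np.inf
for k in range(3, 9):
    edge = 2 * np.sqrt(k - 1)
    for z in edge + np.geomspace(1e-6, 1e3, 200):
        grid_min = rhs_33(k, z, betas).min()
        closed = -1 + np.sqrt(z * G(k, z))  # minimum over beta' in closed form
        assert abs(grid_min - closed) < 1e-6
        assert z * G(k, z) <= 2 * (k - 1) / (k - 2) + 1e-9
        worst = max(worst, closed)
print(worst)  # at most 1 (approached for k = 3 at the spectral edge)
```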



C. Proof of Theorem II.4: Outline

The proof of this theorem follows closely the proof of Theorem II.3. We will prove that the same bound (11) holds for an $A$ chosen at random with a distribution supported on $\mathcal{A}^{(D)}$, whence the claim follows. In order to lower bound the error probability for random matrices, we make use of Lemma III.1.

We construct the random matrix $A$ as follows. Let $\tilde{A}$ be a random symmetric matrix with $\{\tilde{A}_{ij}\}_{i<j}$ i.i.d. random variables, where $\mathbb{P}(\tilde{A}_{ij} = a_{\min}/\sqrt{p}) = \mathbb{P}(\tilde{A}_{ij} = -a_{\min}/\sqrt{p}) = 1/4$ and $\mathbb{P}(\tilde{A}_{ij} = 0) = 1/2$. Notice that the second moment of each entry is $\mathbb{E}(\tilde{A}_{ij}^2) = a_{\min}^2/(2p) = \alpha/p$, where $\alpha = a_{\min}^2/2$. We then define $A = -(\gamma + 2\sqrt{\alpha})I + \tilde{A}$, where $\gamma = \gamma(\tilde{A}) \ge 0$ is the smallest value that guarantees $\lambda_{\min}(-A) \ge \rho$.
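A sketch of this dense construction, under our own assumptions: the function name is hypothetical, the diagonal of $\tilde{A}$ is taken to be zero, and $\gamma$ is chosen in closed form as $\max\{0, \rho + \lambda_{\max}(\tilde{A}) - 2\sqrt\alpha\}$. By Wigner's semicircle law the largest eigenvalue of $\tilde{A}$ is close to $2\sqrt\alpha$ for large $p$, so $\gamma$ is close to $\rho$.

```python
import numpy as np


def dense_ensemble(p, a_min, rho, rng):
    """Sample A = -(gamma + 2 sqrt(alpha)) I + A_tilde, alpha = a_min^2 / 2 (zero diagonal assumed)."""
    alpha = a_min**2 / 2
    # Entries +-a_min/sqrt(p) with probability 1/4 each and 0 with probability 1/2
    # (the keyword p= below is Generator.choice's probability vector, not the dimension).
    vals = rng.choice([-1.0, 0.0, 1.0], size=(p, p), p=[0.25, 0.5, 0.25]) * a_min / np.sqrt(p)
    upper = np.triu(vals, 1)
    A_tilde = upper + upper.T
    lam_max = np.linalg.eigvalsh(A_tilde).max()
    gamma = max(0.0, rho + lam_max - 2 * np.sqrt(alpha))
    A = -(gamma + 2 * np.sqrt(alpha)) * np.eye(p) + A_tilde
    return A, A_tilde, gamma


rng = np.random.default_rng(0)
p, a_min, rho = 400, 1.0, 0.2
A, A_tilde, gamma = dense_ensemble(p, a_min, rho, rng)
alpha = a_min**2 / 2
off = A_tilde[~np.eye(p, dtype=bool)]
print(p * np.mean(off**2), alpha)                             # second moment times p, about alpha
print(np.linalg.eigvalsh(A_tilde).max(), 2 * np.sqrt(alpha))  # spectral edge, about 2 sqrt(alpha)
print(np.linalg.eigvalsh(-A).min() >= rho - 1e-9)             # True
```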



D. Proof of Theorem II.5

The proof consists in evaluating the lower bound in Corollary II.2. We again prove the theorem by showing that the bound holds for a random class of functions contained in $\mathcal{A}^{(N)}$.

We consider a set of functions such that, for each possible support of a $p\times p$ matrix with at most $k$ non-zero entries per row, there is one and only one function in the family with $JF$ having that support for all $x$, and we draw the function uniformly from this family.

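Now notice that $\mathbb{E}_{x_0}\mathrm{Var}_{A|x_0}(F(x_0;A)) \le \mathbb{E}(\|F(x_0;A)\|^2)$. Secondly, notice that if $x$ and $x'$ only differ in the $j$-th component and $(JF)_{ij} \neq 0$, then $|F_i(x;A)| \le |F_i(x';A)| + D\,|x^{(j)} - x'^{(j)}|$. Since $JF$ has at most $k$ non-zero entries per row, changing one coordinate at a time we get that for any $x$ and $x'$, $|F_i(x;A)| \le |F_i(x';A)| + D\sum_{j\in S_i}|x^{(j)} - x'^{(j)}|$, where $S_i$, with $|S_i| \le k$, is the support of the $i$-th row of $JF$. If $x = x_0$ and $x' = \mathbb{E}_{x_0|A}(x_0)$, then squaring the previous expression (using $(a+b)^2 \le 2a^2 + 2b^2$ and $(\sum_{j\in S_i}c_j)^2 \le k\sum_{j\in S_i}c_j^2$) and taking expectations gives $\mathbb{E}_{x_0|A}(F_i(x_0;A)^2) \le 2F_i(x';A)^2 + 2k^2D^2B$. From this we get that $\mathbb{E}(\|F(x_0;A)\|^2)/p \le C + 2k^2D^2B$, where $C$ is a constant independent of $A$. For this subfamily of functions we have $H(M(A)) \ge pk\log(p/k)$. By (29) and (30) we know that $I(x_0;A) \le \frac12\log((2\pi e)^p|\mathbb{E}\Sigma_\infty|) - \frac12\mathbb{E}\log((2\pi e)^p|\Sigma_\infty|)$. The first term, which is the entropy of a $p$-dimensional Gaussian with covariance matrix $\mathbb{E}\Sigma_\infty$, can be upper bounded by the sum of the entropies of its individual components, which have variance upper bounded by $B$. Finally, since $\lambda_{\min}(\Sigma_\infty) \ge L$, we have $\log|\Sigma_\infty| \ge p\log L$ and therefore $I(x_0;A) \le \frac{p}{2}\log(B/L)$, which completes the proof. ■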
Appendix



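A. Proof of Theorem II.4

The following lemma contains a random matrix calculation that will be used later in this proof when applying Lemma III.1. Recall that we defined $\alpha = a_{\min}^2/2$.

Lemma A.1. Let $A$ be a random matrix defined as above and

$$Q(a_{\min},\rho) = \lim_{p\to\infty}\frac1p\big(\mathrm{Tr}\{\mathbb{E}(-A)\} - \mathrm{Tr}\{(\mathbb{E}(-A^{-1}))^{-1}\}\big). \qquad (35)$$

Then there exists a constant $C$ such that

$$Q(a_{\min},\rho) \le \min\Big\{\frac{C\,a_{\min}^2}{2\rho}, \frac{a_{\min}}{\sqrt2}\Big\}. \qquad (36)$$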

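Proof: Using Wigner's semicircle law for random symmetric matrices [16] and the bound described in (20)-(21), it follows that

$$\lim_{p\to\infty}\frac1p\mathrm{Tr}\{\mathbb{E}(-A)\} = \rho + 2\sqrt\alpha, \qquad (37)$$

$$\lim_{p\to\infty}\mathbb{E}\Big\{\frac1p\mathrm{Tr}\{(-A)^{-1}\}\Big\} = G(\alpha,\rho), \qquad (38)$$

$$G(\alpha,\rho) = \frac{\rho + 2\sqrt\alpha - \sqrt{(\rho+2\sqrt\alpha)^2 - 4\alpha}}{2\alpha}, \qquad (39)$$

so that $Q(a_{\min},\rho) \le \rho + 2\sqrt\alpha - G(\alpha,\rho)^{-1}$. Since $G(\alpha,\rho) = \alpha^{-1/2}G(1,\rho/\sqrt\alpha)$, we can write $\rho + 2\sqrt\alpha - G(\alpha,\rho)^{-1} = \sqrt\alpha\,\hat{G}(\rho/\sqrt\alpha)$, where $\hat{G}(x)$ is a strictly decreasing function. Since $\lim_{\rho\to0}\sqrt\alpha\,\hat{G}(\rho/\sqrt\alpha) = \sqrt\alpha$ and $\lim_{\rho\to\infty}\rho\sqrt\alpha\,\hat{G}(\rho/\sqrt\alpha) = \alpha$, it follows that there is a constant $C'$ independent of $\alpha$ and $\rho$ such that $\sqrt\alpha\,\hat{G}(\rho/\sqrt\alpha) \le \sqrt\alpha\min\{1, C'\sqrt\alpha/\rho\}$. The result now follows by replacing $\alpha = a_{\min}^2/2$. ■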
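The quantities in this proof are explicit, so the claimed properties can be checked numerically. The sketch compares (39) with a direct quadrature against the semicircle density of variance $\alpha$, and evaluates $\rho + 2\sqrt\alpha - G(\alpha,\rho)^{-1}$ through the algebraically equivalent form $2\alpha/\big(\rho + 2\sqrt\alpha + \sqrt{\rho(\rho + 4\sqrt\alpha)}\big)$, which avoids cancellation. In that form it is immediate that the expression is at most $\min\{\sqrt\alpha, \alpha/\rho\}$, so $C' = 1$ suffices for this particular expression. Names and grids are our assumptions.

```python
import numpy as np


def G_39(alpha, rho):
    """Semicircle Stieltjes transform (39) evaluated at w = rho + 2 sqrt(alpha)."""
    w = rho + 2 * np.sqrt(alpha)
    return (w - np.sqrt(w**2 - 4 * alpha)) / (2 * alpha)


def G_quad(alpha, rho, n=4000):
    """Same quantity by quadrature of the semicircle density sqrt(4 alpha - x^2) / (2 pi alpha)."""
    r = 2 * np.sqrt(alpha)
    theta = (np.arange(n) + 0.5) * np.pi / n
    x = r * np.cos(theta)  # x = r cos(theta), |dx/dtheta| = r sin(theta)
    dens = np.sqrt(4 * alpha - x**2) / (2 * np.pi * alpha)
    return np.sum(dens * r * np.sin(theta) / (rho + r - x)) * np.pi / n


def Q_dense_upper(alpha, rho):
    """rho + 2 sqrt(alpha) - 1/G(alpha, rho), as 2 alpha / (w + sqrt(rho (rho + 4 sqrt(alpha))))."""
    w = rho + 2 * np.sqrt(alpha)
    return 2 * alpha / (w + np.sqrt(rho * (rho + 4 * np.sqrt(alpha))))


alpha = 0.5
print(G_39(alpha, 0.7), G_quad(alpha, 0.7))     # matching values
print(Q_dense_upper(alpha, 0.0), np.sqrt(alpha))  # rho -> 0 limit: sqrt(alpha)
print(1e4 * Q_dense_upper(alpha, 1e4), alpha)     # rho * Q -> alpha
rhos = np.geomspace(1e-4, 1e4, 400)
q = Q_dense_upper(alpha, rhos)
print(np.all(np.diff(q) < 0), np.all(q <= np.minimum(np.sqrt(alpha), alpha / rhos)))  # True True
```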



Proof (Theorem II.4): Like in the proof of Theorem II.3, we start by dividing both the numerator and the denominator of (13) in Lemma III.1 by $p$. By multiplying the resulting expression by an appropriately small constant we can replace the denominator and $\lim_{p\to\infty} I(x_0;A)/p$ by their limits as $p \to \infty$ and get an expression that is still valid for all $p$ large enough. Since $H(M(A))/p$ grows linearly in $p$ (each of the $p(p-1)/2$ independent off-diagonal entries of $\mathrm{sign}(A)$ has entropy $\frac32\log 2$), and since by Lemma A.1 we already know the limiting expression of the denominator, all we have to do is find $\lim_{p\to\infty} I(x_0;A)/p$. By an analysis very similar to that in the proof of Theorem II.3, one can show that

$$\lim_{p\to\infty}\frac1p I(x_0;A) \le -1 + \sqrt{(z+2)\,G(1,z)} < 1, \qquad (40)$$

where $z = \rho/\sqrt\alpha$ and $G(\alpha,\rho)$ was defined in (39), which finishes the proof. ■
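The final inequality (40) can be checked in the same way. Under our reading of "an analysis very similar to that in the proof of Theorem II.3", the analogue of (33) for the dense ensemble is $-1 + \frac12\big(\beta'(z+2) + \beta'^{-1}G(1,z)\big)$ with $\beta' = \beta\sqrt\alpha$; this is our reconstruction, not a formula stated above. Its minimum over $\beta'$ is $-1 + \sqrt{(z+2)G(1,z)}$, and since $(z+2)G(1,z) \le 2$, the right-hand side of (40) is at most $\sqrt2 - 1 < 1$. A small NumPy sketch under our own grid choices:

```python
import numpy as np


def G1(z):
    """G(1, z) from (39), written as 2 / (w + sqrt(w^2 - 4)) with w = z + 2."""
    w = z + 2
    return 2 / (w + np.sqrt(z * (z + 4)))


def rhs_dense(z, beta):
    """Assumed analogue of (33) for the dense ensemble, with beta' = beta sqrt(alpha)."""
    return -1 + 0.5 * (beta * (z + 2) + G1(z) / beta)


betas = np.geomspace(1e-6, 1e2, 40001)
zs = np.concatenate(([0.0], np.geomspace(1e-6, 1e4, 300)))
closed = -1 + np.sqrt((zs + 2) * G1(zs))
grid = np.array([rhs_dense(z, betas).min() for z in zs])
print(np.abs(grid - closed).max() < 1e-6)  # True: closed-form minimum over beta'
print(closed.max(), np.sqrt(2) - 1)        # largest value at z = 0, equal to sqrt(2) - 1
```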



